 influence relation


Variance-Reduced Long-Term Rehearsal Learning with Quadratic Programming Reformulation

Neural Information Processing Systems

In machine learning, a critical class of decision-making problems involves Avoiding Undesired Future (AUF): given a predicted undesired outcome, how can one decide on actions to prevent it? Recently, the rehearsal learning framework has been proposed to address the AUF problem. While existing methods offer reliable decisions for single-round success, this paper considers long-term settings that involve coordinating multiple future outcomes, which is often required in real-world tasks. Specifically, we generalize the AUF objective to characterize a long-term decision target that incorporates cross-temporal relations among variables. As directly optimizing the AUF probability P_AUF over this objective remains challenging, we derive an explicit expression for the objective and further propose a quadratic programming (QP) reformulation that transforms the intractable probabilistic AUF optimization into a tractable one. Under mild assumptions, we show that solutions to the QP reformulation are equivalent to those of the original AUF optimization. Based on this equivalence, we develop two novel rehearsal learning methods for long-term decision-making: (i) a greedy method that maximizes the single-round P_AUF at each step, and (ii) a far-sighted method that accounts for future consequences in each decision, yielding a higher overall P_AUF through an L/(L+1) variance reduction in the AUF objective. We further establish an O(1/√N) excess risk bound for decisions based on estimated parameters, ensuring reliable practical applicability with finite data.
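As a toy illustration of why a probabilistic target can reduce to a quadratic program, consider a single outcome Y = slope * a + intercept + noise with Gaussian noise and a desired interval [low, high]. This is a minimal sketch under that assumed one-dimensional linear Gaussian model, not the paper's formulation, and all function and parameter names are hypothetical. In this setting the success probability depends on the action only through the distance between the predicted mean and the interval midpoint, so minimizing a squared distance under box constraints selects the same action as maximizing the probability.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def success_probability(action, slope, intercept, sigma, low, high):
    """P(low <= Y <= high) for Y = slope * action + intercept + N(0, sigma^2).

    Assumed toy model: one scalar action, one scalar outcome.
    """
    mean = slope * action + intercept
    return normal_cdf((high - mean) / sigma) - normal_cdf((low - mean) / sigma)


def quadratic_surrogate_action(slope, intercept, low, high, a_min, a_max):
    """Minimize (slope * a + intercept - midpoint)^2 subject to a_min <= a <= a_max.

    For a one-dimensional convex quadratic, clipping the unconstrained
    minimizer to the feasible interval gives the constrained minimizer.
    """
    midpoint = 0.5 * (low + high)
    if slope == 0.0:
        return a_min  # the action has no effect, so any feasible value is optimal
    unconstrained = (midpoint - intercept) / slope
    return min(max(unconstrained, a_min), a_max)


if __name__ == "__main__":
    a = quadratic_surrogate_action(2.0, -1.0, 4.0, 6.0, 0.0, 2.0)
    p = success_probability(a, 2.0, -1.0, 1.0, 4.0, 6.0)
    print(f"action={a:.2f}, success probability={p:.3f}")
```

The noise level sigma does not appear in the surrogate at all: in this toy model it scales the achievable probability but does not change which action is best.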


Avoiding Undesired Future with Minimal Cost in Non-Stationary Environments

Neural Information Processing Systems

Machine learning (ML) has achieved remarkable success in prediction tasks. In many real-world scenarios, rather than solely predicting an outcome using an ML model, the crucial concern is how to make decisions to prevent the occurrence of undesired outcomes, known as the avoiding undesired future (AUF) problem. To this end, a new framework called rehearsal learning has been proposed recently, which works effectively in stationary environments by leveraging the influence relations among variables. In real tasks, however, the environments are usually non-stationary and the influence relations may be dynamic, causing the existing method to fail at AUF. In this paper, we introduce a novel sequential methodology that effectively updates the estimates of dynamic influence relations, which are crucial for rehearsal learning to prevent undesired outcomes in non-stationary environments. Meanwhile, we take the cost of decision actions into account and provide a formulation of the AUF problem with minimal action cost under non-stationarity. We prove that in linear Gaussian cases, the problem can be transformed into a well-studied convex quadratically constrained quadratic program (QCQP). In this way, we establish the first polynomial-time rehearsal-based approach for addressing the AUF problem.
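One standard way to track a drifting linear relation online is recursive least squares with a forgetting factor. The sketch below uses that technique as a stand-in to show what sequentially updating an influence estimate can look like. It is not the sequential method proposed in the paper, and the scalar model y = theta * x + noise and all names are assumptions made for illustration.

```python
def track_coefficient(xs, ys, forgetting=0.9, p0=1000.0):
    """Recursive least squares with a forgetting factor for y = theta * x + noise.

    Returns the estimate of theta after each observation. A forgetting
    factor below 1 discounts old data geometrically, so the estimate can
    follow a coefficient that changes over time.
    """
    theta = 0.0   # initial estimate (assumed prior guess)
    p = p0        # initial inverse-information; large means a weak prior
    estimates = []
    for x, y in zip(xs, ys):
        gain = p * x / (forgetting + x * p * x)
        theta = theta + gain * (y - theta * x)   # correct by the prediction error
        p = (p - gain * x * p) / forgetting      # discount past information
        estimates.append(theta)
    return estimates


if __name__ == "__main__":
    # Hypothetical data: the true coefficient jumps from 1.0 to 3.0 halfway through.
    xs = [1.0 + (t % 3) for t in range(100)]
    ys = [(1.0 if t < 50 else 3.0) * x for t, x in enumerate(xs)]
    est = track_coefficient(xs, ys)
    print(f"before change: {est[49]:.3f}, after change: {est[99]:.3f}")
```

A smaller forgetting factor adapts faster after a change but makes the estimate noisier when the relation is actually stable, so the value is a trade-off rather than a fixed recommendation.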



